Impact of learning set quality and size on decision tree performances
Authors
Abstract
The quality of a decision tree is usually evaluated through its complexity and its generalization accuracy. Tree-simplification procedures aim at optimizing these two performance criteria. Among them, data reduction techniques differ from pruning in their simplification strategy. While pruning algorithms directly control tree size to combat the overfitting problem, data reduction techniques preprocess the data before the decision tree is built, in order to improve the quality of the learning set. Recent experimental results have shown that randomly manipulating training set size has a direct impact on tree size, and therefore recommend the use of the latter simplification strategy. In this paper, we provide theoretical arguments justifying data preprocessing in favor of tree simplification. We also investigate new data reduction techniques, usually used in the field of prototype selection. From experiments with 22 datasets, we show that some of them are very effective at improving standard post-pruning performance.
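As a rough illustration of data reduction as a preprocessing step, the sketch below filters a learning set before any tree is built. The abstract does not name the specific techniques studied, so the choice of Wilson editing (edited nearest neighbor, a classic prototype-selection filter) is an assumption made here for illustration only, and the function name `edited_nearest_neighbor` and the toy data are hypothetical.

```python
from collections import Counter
from math import dist


def edited_nearest_neighbor(points, labels, k=3):
    """Return indices of instances whose label agrees with the majority
    label of their k nearest neighbours (Wilson editing, leave-one-out).

    Illustrative assumption: this is one classic prototype-selection
    filter, not necessarily a technique evaluated in the paper.
    """
    kept = []
    for i, p in enumerate(points):
        # Rank every other instance by Euclidean distance to p.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )[:k]
        majority = Counter(labels[j] for j in neighbours).most_common(1)[0][0]
        # Keep the instance only if its neighbourhood agrees with its label.
        if majority == labels[i]:
            kept.append(i)
    return kept


# Toy learning set: two clusters, with a suspicious "b" at index 3
# sitting in the middle of the "a" cluster.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
          (1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
labels = ["a", "a", "a", "b", "b", "b", "b"]

kept = edited_nearest_neighbor(points, labels, k=3)
reduced_points = [points[i] for i in kept]
reduced_labels = [labels[i] for i in kept]
```

A decision tree would then be induced from `reduced_points` and `reduced_labels` rather than from the full learning set. With the likely noisy instance removed, the tree no longer needs extra splits to isolate it, which is the intuition behind cleaning the learning set instead of, or in addition to, pruning afterwards.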
Similar resources
Evaluating the performance of decision tree models in estimating river suspended sediment (case study: Ilam dam basin)
The real estimation of the volume of sediments carried by rivers in water projects is very important. In fact, achieving the most important ways to calculate sediment discharge has been considered as the objective of the most research projects. Among these methods, the machine learning methods such as decision trees model (that are based on the principles of learning) can be presented. Decision...
Evaluation of liquefaction potential based on CPT results using C4.5 decision tree
The prediction of liquefaction potential of soil due to an earthquake is an essential task in Civil Engineering. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. C4.5 is a known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to solve the problem of the...
Anomaly Detection Using SVM as Classifier and Decision Tree for Optimizing Feature Vectors
Abstract- With the advancement and development of computer network technologies, the way for intruders has become smoother; therefore, to detect threats and attacks, the importance of intrusion detection systems (IDS) as one of the key elements of security is increasing. One of the challenges of intrusion detection systems is managing of the large amount of network traffic features. Removing un...
MMDT: Multi-Objective Memetic Rule Learning from Decision Tree
In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretation are two measures that conflict with each other. In this approach, we consider accuracy and interpretation of rules sets. Additionally, individual classifiers face other problems such as huge sizes, high dimensionality and imbalance classes’ distribution data sets. This...
Generating Better Decision Trees
A new decision tree learning algorithm called IDX is described. More general than existing algorithms, IDX addresses issues of decision tree quality largely overlooked in the artificial intelligence and machine learning literature. Decision tree size, error rate, and expected classification cost are just a few of the quality measures it can exploit. Furthermore, decision trees of varying qualit...
Journal title:
- Int. J. Comput. Syst. Signal
Volume 1, issue
Pages -
Publication date: 2000